Open Data in Practice

Zachary Batist

McGill University
Epidemiology, Biostatistics and Occupational Health

October 28, 2025

How do researchers actually engage with open science practices, principles and tooling?


vs.

How advocates claim science works, and data’s role in their vision of scientific knowledge production

Open Science: a movement to make scientific research more accessible to scientists and society in general.


Open Data: the application of open science principles to data

About me

  • About me

  • What are data?

  • How open science tends to imagine data

  • Examples

  • Take-Aways

  • Questions?

Primary Interests

  • Collaborative research practices
  • How digital tools and infrastructures reconfigure knowledge production
  • Specifically:
    • Infrastructures that support data sharing
    • Open source communities of practice
    • The cultural and epistemic implications of research infrastructures and policy

  • How does open science (esp. open data) reconfigure, or attempt to reconfigure, collaborative experiences?
  • What values are associated with open data, and how do they intersect with scientists’ actual needs, desires and values?

Currently:

  • Technical and administrative systems that scaffold the Covid Immunity Task Force Databank

Previously (and ongoing, to a lesser extent):

  • Data management within archaeological projects
  • Archaeological data sharing, integration and reuse

Also:

  • Collaborative software development practices
  • Small-scale and community-driven data sharing initiatives

What are data?

  1. Descriptive accounts of observed phenomena

Cats in my neighbourhood:
Name      Colour   Sex   Feral
Ellie     Mix      F     Y
Bob       Brown    M     N
Charlie   Brown    M     N
Jasper    Black    M     N
????      Tortie   ?     Y
Spencer   White    M     N

  1. Descriptive accounts of observed phenomena
  2. Evidence that forms the basis of a claim

Data were collected to document Ellie’s dietary preferences by comparing the quantities of food she consumed across various flavours.

Based on analysis of these data, we found that Ellie has a clear preference for chicken pâté relative to all other options presented to her.

  1. Descriptive accounts of observed phenomena
  2. Evidence that forms the basis of a claim
  3. Means of communicating observations from one set of circumstances to another

Dearest Colleagues,

I’m enclosing the data from my neighbourhood observations.

My observations were made between 2 PM and 6 PM, during my Sunday strolls.

The dataset includes cats I saw in ground-floor apartment windows and cats I encountered on the street.

Note that I saw no outdoor cats on rainy days, and I skipped my walk last week.

Feel free to make use of this very important information as you see fit.

What are data?

  1. Descriptive accounts of observed phenomena
  2. Evidence that forms the basis of a claim
  3. Means of communicating observations from one set of circumstances to another

Implies:

  • Data are fixed and stable
  • Data are generated prior to and separate from analysis
  • Data carry an aura of authority, stability, and truthfulness

What are data?

  1. Descriptive accounts of observed phenomena
  2. Evidence that forms the basis of a claim
  3. Means of communicating observations from one set of circumstances to another

Implies:

  • Data are created under specific circumstances
  • Data are created for targeted purposes
  • Data are informed by partial background knowledge

How open science tends to imagine data

The Open Data Imaginary

  • Non-material and non-political
  • Reflections of natural reality
  • Infinitely re-configurable

Implies:

  • Spreadsheets, databases and articles are considered value-neutral representations
  • Belief that anyone can create and access data
    • BUT: this requires scientific resources, internet access, computational resources, and the expertise needed to make sense of data

The Open Data Imaginary

  • Non-material and non-political
  • Reflections of natural reality
  • Infinitely re-configurable

Implies:

  • Belief that data are reflections of reality, rather than outcomes of decisions and actions
  • Data’s legitimacy considered to derive from their objectivity

The Open Data Imaginary

  • Non-material and non-political
  • Reflections of natural reality
  • Infinitely re-configurable

Implies:

  • Diverse data can click together to form new knowledge
  • More data ⟹ more possible configurations
  • Ignores the extreme challenges involved in making heterogeneous datasets compatible

Science is intrinsically material and positional.

Open science fails when it does not account for these things.

Examples

Database of Obsidian Sourcing Studies (DObsiSS)

Different geological sources identified by different labs (as of 1998, ~27 years ago)

Data presented as prose, and mixed with analysis of non-chemical characteristics (2003)

dataARC

Graphic overview of data sources embedded in dataARC (https://data-arc.org)

Take-Aways

  • Open data as currently practiced encourages taking data at face value
  • Most researchers intuitively understand data to be more complex than that
  • More valuable insights may be obtained through
    • Smaller-scale initiatives
    • Directed by specific objectives
    • With specific designated communities in mind

Questions?
